Abstract

The role of the normalized modularity matrix in finding homogeneous cuts is
presented. We also discuss the testability of the structural eigenvalues and that
of the subspace spanned by the corresponding eigenvectors of this matrix. In the
presence of a spectral gap between the $k-1$ largest absolute value eigenvalues and
the remainder of the spectrum, this in turn implies the testability of the sum of
the inner variances of the $k$ clusters that are obtained by applying the $k$-means
algorithm to the appropriately chosen vertex representatives.
1 Introduction



The purpose of this paper is to summarize the spectral properties and testability
of the spectrum and spectral subspaces of the normalized modularity
matrix introduced in [9] to find regular vertex partitions. We will generalize
the Laplacian based spectral clustering methods to recover so-called volume
regular cluster pairs such that the information flow between the pairs and
within the clusters is as homogeneous as possible. For this purpose, we take
into consideration both ends of the normalized Laplacian spectrum, i.e., the large
absolute value, so-called structural eigenvalues of our normalized modularity
matrix, introduced just for this convenience.

In Theorem 3, we estimate the constant of volume regularity in terms of the
gap between the structural and other eigenvalues, and the $k$-variance of the
optimal vertex representatives constructed by the eigenvectors corresponding to
the structural eigenvalues. Here we give a more detailed proof of this statement
than in [10]. This theorem implies that for a general edge-weighted graph, the
existence of $k-1$ structural eigenvalues of the normalized modularity matrix,
separated from 0, is an indication of a $k$-cluster structure such that the cluster
pairs are volume regular with constant depending on the spectral gap and the
above $k$-variance. The clusters themselves can be recovered by applying the
$k$-means algorithm to the vertex representatives. Hence, Theorem 3 implies
that spectral clustering of the vertices into $k$ parts gives a satisfactory partition
in the sense of volume regularity.

Furthermore, in Theorems 8 and 10, we prove the testability of the structural
eigenvalues and the corresponding eigen-subspace of the normalized modularity
matrix in the sense of [12]. In view of this, spectral clustering methods
can be performed on a smaller part of the underlying graph and still give a good
approximation of the cluster structure.



2 Preliminaries 

Throughout the paper, we use the general framework of an edge-weighted
graph. Let $G = G_n = (V, \mathbf{W})$ be an edge-weighted graph on vertex-set $V$
($|V| = n$) and $n \times n$ symmetric weight-matrix $\mathbf{W}$ of non-negative real entries
and zero diagonal. We will call the numbers $d_i = \sum_{j=1}^n w_{ij}$ ($i = 1, \dots, n$)
generalized degrees, and the diagonal matrix $\mathbf{D} = \mathrm{diag}(d_1, \dots, d_n)$ degree
matrix. In this and the next section, without loss of generality, $\mathrm{Vol}(V) = 1$
will be assumed, where the volume of the vertex-subset $U \subseteq V$ is
$\mathrm{Vol}(U) = \sum_{i \in U} d_i$. In the sequel, we only consider connected graphs,
which means that $\mathbf{W}$ is irreducible.

In [9], we defined the normalized version of the modularity matrix (introduced
in [21]) as $\mathbf{M}_D = \mathbf{D}^{-1/2}\mathbf{W}\mathbf{D}^{-1/2} - \sqrt{\mathbf{d}}\sqrt{\mathbf{d}}^T$, where
$\sqrt{\mathbf{d}} = (\sqrt{d_1}, \dots, \sqrt{d_n})^T$, and we called it normalized modularity matrix.
The spectrum of this matrix is in the $[-1,1]$ interval, and 0 is always an eigenvalue
with unit-norm eigenvector $\sqrt{\mathbf{d}}$. Indeed, in [5] we proved that 1 is a single
eigenvalue of $\mathbf{D}^{-1/2}\mathbf{W}\mathbf{D}^{-1/2}$ with corresponding unit-norm eigenvector $\sqrt{\mathbf{d}}$,
provided our graph is connected. This becomes a zero eigenvalue of $\mathbf{M}_D$ with the
same eigenvector, whence 1 cannot be an eigenvalue of $\mathbf{M}_D$ if $G$ is connected.
In fact, the introduction of this matrix is rather technical: the spectral gap and,
further, Lemma 1 and Theorem 3 can better be formulated with it. It can also be
obtained from the normalized Laplacian by subtracting it from the identity and
depriving it of its trivial factor. The normalized Laplacian was used for spectral
clustering in several papers (e.g., [3,5,6,14,20]), the idea of which can be summarized
by means of the spectral decomposition of the normalized modularity matrix. We
introduce the following notation: the weighted cut between the vertex-subsets
$X, Y \subseteq V$ is $w(X, Y) = \sum_{i \in X} \sum_{j \in Y} w_{ij}$. We will frequently refer to the
following facts.

(a) The spectral decomposition of $\mathbf{M}_D$ solves the following quadratic placement
problem. For a given positive integer $k$ ($1 < k < n$), we want to minimize
$Q_k = \sum_{i<j} w_{ij} \|\mathbf{r}_i - \mathbf{r}_j\|^2$ on the conditions

$$\sum_{i=1}^n d_i \mathbf{r}_i = \mathbf{0}, \qquad \sum_{i=1}^n d_i \mathbf{r}_i \mathbf{r}_i^T = \mathbf{I}_{k-1}, \qquad (1)$$

where the vectors $\mathbf{r}_1, \dots, \mathbf{r}_n$ are $(k-1)$-dimensional representatives of the
vertices, which form the row vectors of the $n \times (k-1)$ matrix $\mathbf{X}$. Denote the
eigenvalues of $\mathbf{M}_D$, in decreasing order, by $1 > \lambda_1 \ge \dots \ge \lambda_n \ge -1$ with
corresponding unit-norm, pairwise orthogonal eigenvectors $\mathbf{u}_1, \dots, \mathbf{u}_n$. In [5],
we proved that the minimum of $Q_k$ subject to (1) is $k - 1 - \sum_{i=1}^{k-1} \lambda_i$ and is
attained by the representation such that the optimum vertex representatives
$\mathbf{r}_1^*, \dots, \mathbf{r}_n^*$ are row vectors of the matrix
$\mathbf{X}^* = (\mathbf{D}^{-1/2}\mathbf{u}_1, \dots, \mathbf{D}^{-1/2}\mathbf{u}_{k-1})$.
Instead of $\mathbf{X}$, the augmented $n \times k$ matrix $\tilde{\mathbf{X}}$ can as well be used, which
is obtained from $\mathbf{X}$ by inserting the column $\mathbf{x}_0 = \mathbf{1}$ of all 1's. In fact,
$\mathbf{x}_0 = \mathbf{D}^{-1/2}\mathbf{u}_0$, where $\mathbf{u}_0 = \sqrt{\mathbf{d}}$ is the eigenvector corresponding to the
eigenvalue 1 of $\mathbf{D}^{-1/2}\mathbf{W}\mathbf{D}^{-1/2}$. Then

$$Q_k = \mathrm{tr}\, (\mathbf{D}^{1/2}\tilde{\mathbf{X}})^T (\mathbf{I}_n - \mathbf{D}^{-1/2}\mathbf{W}\mathbf{D}^{-1/2}) (\mathbf{D}^{1/2}\tilde{\mathbf{X}}),$$

and minimizing $Q_k$ on the constraint (1) is equivalent to minimizing the
above expression subject to $\tilde{\mathbf{X}}^T \mathbf{D} \tilde{\mathbf{X}} = \mathbf{I}_k$. This problem is the continuous
relaxation of minimizing

$$Q_k(P_k) = \mathrm{tr}\, (\mathbf{D}^{1/2}\tilde{\mathbf{X}}(P_k))^T (\mathbf{I}_n - \mathbf{D}^{-1/2}\mathbf{W}\mathbf{D}^{-1/2}) (\mathbf{D}^{1/2}\tilde{\mathbf{X}}(P_k))$$

over the set of $k$-partitions $P_k = (V_1, \dots, V_k)$ of the vertices such that $P_k$ is
planted into $\tilde{\mathbf{X}}$ in the way that the columns of $\tilde{\mathbf{X}}(P_k)$ are so-called normalized
partition-vectors belonging to $P_k$. Namely, the coordinates of the $i$th
column are zeros, except those indexing vertices of $V_i$, which are equal to
$1/\sqrt{\mathrm{Vol}(V_i)}$ ($i = 1, \dots, k$). In fact, this is the normalized cut problem, which is
discussed in [20] for $k = 2$, further, in [3] and [6] for a general $k$, and the
solution is based on the above continuous relaxation.

(b) Now, let us maximize the normalized Newman-Girvan modularity of $G$
induced by $P_k$, defined in [9] as

$$M_k(P_k) = \sum_{a=1}^k \frac{1}{\mathrm{Vol}(V_a)} \sum_{i,j \in V_a} (w_{ij} - d_i d_j)$$

over the set $\mathcal{P}_k$ of the $k$-partitions of $V$. It is easy to see that $M_k(P_k) =
k - 1 - Q_k(P_k)$, and hence, the above task has the same spectral relaxation
as the normalized cut problem. Let $M_k = \max_{P_k \in \mathcal{P}_k} M_k(P_k)$ denote the
maximum $k$-way normalized Newman-Girvan modularity of the weighted
graph $G$.

(c) Finally, from the above considerations it is straightforward that
$M_k \le \sum_{i=1}^{k-1} \lambda_i$, or equivalently, the minimum normalized $k$-way cut is at least
the sum of the $k-1$ smallest positive normalized Laplacian eigenvalues.
As for the minimum normalized $k$-way cut, in [6] we also gave an upper
estimate by a constant times the sum of the $k-1$ smallest positive normalized
Laplacian eigenvalues, where the constant depends on the so-called $k$-variance
of the vertex representatives defined in the following way.

$$S_k^2(\mathbf{X}) = \min_{P_k \in \mathcal{P}_k} S_k^2(\mathbf{X}, P_k) = \min_{P_k = (V_1, \dots, V_k)} \sum_{a=1}^k \sum_{j \in V_a} d_j \|\mathbf{r}_j - \mathbf{c}_a\|^2 \qquad (2)$$

where $\mathbf{c}_a = \frac{1}{\mathrm{Vol}(V_a)} \sum_{j \in V_a} d_j \mathbf{r}_j$ is the weighted center of cluster $V_a$ and
$\mathbf{r}_1, \dots, \mathbf{r}_n \in \mathbb{R}^{k-1}$ are rows of $\mathbf{X}$. (The augmented $\tilde{\mathbf{X}}$ would give the same
$k$-variance.) The constant of our estimation depended on $S_k^2(\mathbf{X}^*)$, and it was
close to 1 if this $k$-variance of the optimum $(k-1)$-dimensional vertex
representatives was small enough. Note that $S_k^2(\mathbf{X}, P_k)$ is the objective function
of the weighted $k$-means algorithm.
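The constructions in facts (a)-(c) can be checked numerically on a small graph. The following is a minimal sketch, assuming NumPy is available; the function names and the toy graph (two triangles joined by a weak edge) are illustrative choices and not part of the paper. It builds the normalized modularity matrix, forms the optimum representatives from the $k-1$ largest eigenvalues, and compares the quadratic placement cost with $k - 1 - \sum_{i=1}^{k-1}\lambda_i$.

```python
import numpy as np

def normalized_modularity(W):
    """Return (M_D, d) for a symmetric weight matrix W, rescaled so that Vol(V) = 1."""
    W = np.asarray(W, dtype=float)
    W = W / W.sum()                          # total weight 1, hence Vol(V) = 1
    d = W.sum(axis=1)                        # generalized degrees d_i
    s = np.sqrt(d)
    M = W / np.outer(s, s) - np.outer(s, s)  # D^{-1/2} W D^{-1/2} - sqrt(d) sqrt(d)^T
    return M, d

def optimum_representatives(W, k):
    """Rows r_i^* of X^* = (D^{-1/2} u_1, ..., D^{-1/2} u_{k-1}) and the eigenvalues used."""
    M, d = normalized_modularity(W)
    lam, U = np.linalg.eigh(M)               # eigenvalues in ascending order
    top = np.argsort(lam)[::-1][: k - 1]     # indices of the k-1 largest eigenvalues
    X = U[:, top] / np.sqrt(d)[:, None]
    return X, lam[top], d

def placement_cost(W, X):
    """Q_k = sum_{i<j} w_ij ||r_i - r_j||^2 for the rescaled weights."""
    W = np.asarray(W, dtype=float)
    W = W / W.sum()
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return 0.5 * float((W * sq).sum())       # each unordered pair is counted twice

# Hypothetical toy graph: two triangles joined by one weak edge.
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

k = 2
M, d = normalized_modularity(W)
X, lam, _ = optimum_representatives(W, k)
print(placement_cost(W, X), k - 1 - lam.sum())   # the two numbers should agree
```

The same helper functions can be reused with any symmetric, non-negative weight matrix with positive degrees; the rescaling step only enforces the convention $\mathrm{Vol}(V) = 1$ and does not change the spectrum.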

In this way, we showed that large positive eigenvalues of the normalized
modularity matrix are responsible for clusters with high intra- and low inter-cluster
densities. Likewise, maximizing $Q_k(P_k)$ instead of minimizing over $\mathcal{P}_k$, small
negative eigenvalues of the normalized modularity matrix are responsible for
clusters with low intra- and high inter-cluster densities (see [9]). Our idea is
that taking into account eigenvalues from both ends of the normalized
modularity spectrum, we can recover so-called regular cluster pairs. For this purpose,
we use the notion of volume regularity to be introduced in the next section.



3 Normalized modularity and volume regularity 



With the normalized modularity matrix, the well-known Expander Mixing
Lemma (for simple graphs see, e.g., [17]) is formulated for edge-weighted
graphs in the following way (see [8]).

Lemma 1 Provided $\mathrm{Vol}(V) = 1$, for all $X, Y \subseteq V$,

$$|w(X, Y) - \mathrm{Vol}(X)\mathrm{Vol}(Y)| \le \|\mathbf{M}_D\| \cdot \sqrt{\mathrm{Vol}(X)\mathrm{Vol}(Y)},$$

where $\|\mathbf{M}_D\|$ denotes the spectral norm of the normalized modularity matrix
of $G = (V, \mathbf{W})$.
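Lemma 1 can be verified by brute force on a small graph. The sketch below assumes NumPy; the function name `mixing_lemma_check` and the random test graph are hypothetical. It enumerates all pairs of non-empty vertex subsets and returns the largest value of $|w(X,Y) - \mathrm{Vol}(X)\mathrm{Vol}(Y)| / \sqrt{\mathrm{Vol}(X)\mathrm{Vol}(Y)}$ together with $\|\mathbf{M}_D\|$.

```python
import itertools
import numpy as np

def mixing_lemma_check(W):
    """Return (worst normalized discrepancy over all subset pairs, spectral norm of M_D)."""
    W = np.asarray(W, dtype=float)
    W = W / W.sum()                          # enforce Vol(V) = 1
    d = W.sum(axis=1)
    s = np.sqrt(d)
    M = W / np.outer(s, s) - np.outer(s, s)
    norm_M = np.max(np.abs(np.linalg.eigvalsh(M)))
    n = len(d)
    # All non-empty subsets as 0/1 indicator vectors (feasible only for small n).
    subsets = [np.array(b, dtype=float)
               for b in itertools.product([0, 1], repeat=n) if any(b)]
    worst = 0.0
    for x in subsets:
        for y in subsets:
            w_xy = x @ W @ y                 # weighted cut w(X, Y)
            vol_x, vol_y = x @ d, y @ d
            worst = max(worst, abs(w_xy - vol_x * vol_y) / np.sqrt(vol_x * vol_y))
    return worst, norm_M

rng = np.random.default_rng(0)
R = np.triu(rng.random((6, 6)), k=1)         # illustrative random weights
worst, norm_M = mixing_lemma_check(R + R.T)
print(worst, norm_M)                         # worst should not exceed norm_M
```

The enumeration is exponential in the number of vertices, so this is only a sanity check for tiny graphs, not a practical way to measure discrepancy.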

Since the spectral gap of $G$ is $1 - \|\mathbf{M}_D\|$, a large spectral gap indicates small
discrepancy as a quasi-random property discussed in [15]. If there is a gap not
at the ends of the spectrum, we want to partition the vertices into clusters so
that a relation similar to the above property for the edge-densities between the
cluster pairs would hold. For this purpose, we use a slightly modified version
of the notion of volume regularity introduced in [2].

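Definition 2 Let $G = (V, \mathbf{W})$ be an edge-weighted graph with $\mathrm{Vol}(V) = 1$.
The disjoint pair $A, B \subseteq V$ is $\alpha$-volume regular if for all $X \subseteq A$, $Y \subseteq B$ we
have

$$|w(X, Y) - \rho(A, B)\mathrm{Vol}(X)\mathrm{Vol}(Y)| \le \alpha \sqrt{\mathrm{Vol}(A)\mathrm{Vol}(B)},$$

where $\rho(A, B) = \frac{w(A, B)}{\mathrm{Vol}(A)\mathrm{Vol}(B)}$ is the relative inter-cluster density of $(A, B)$.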

In the ideal $k$-cluster case, let us consider the following generalized random
simple graph model: given the partition $(V_1, \dots, V_k)$ of $V$ ($|V| = n$), vertices
$i \in V_a$ and $j \in V_b$ are connected with probability $p_{ab}$, independently of each
other, $1 \le a, b \le k$. We can think of the probability $p_{ab}$ as the inter-cluster
density of the pair $(V_a, V_b)$. Since generalized random graphs can be viewed
as edge-weighted graphs with a special block-structure burdened with random
noise, based on [7], we are able to give the following spectral characterization
of them. Fixing $k$, and letting $n$ tend to infinity in such a way that the cluster
sizes grow at the same rate, there exists a positive number $\theta < 1$, independent
of $n$, such that for every $0 < \tau < 1/2$ there are exactly $k-1$ eigenvalues
of $\mathbf{M}_D$ greater than $\theta - n^{-\tau}$, while all the others are at most $n^{-\tau}$ in absolute
value. Further, the $k$-variance of the vertex representatives constructed
by the $k-1$ transformed structural eigenvectors is $O(n^{-2\tau})$, and the cluster
pairs are $\alpha$-volume regular with any small $\alpha$, almost surely. Note that
generalized quasirandom graphs defined in [18] are deterministic counterparts of
generalized random graphs with the same spectral properties.
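The spectral picture of a generalized random graph can be illustrated by simulation. The following sketch assumes NumPy; the block sizes, the connection probabilities and the function names are illustrative assumptions, not values from the paper. It samples a graph from the model with $k = 3$ and lists the largest absolute value eigenvalues of its normalized modularity matrix, where $k - 1 = 2$ of them are expected to stand apart from the rest.

```python
import numpy as np

def planted_partition(sizes, P, rng):
    """Simple graph in which vertices of blocks a and b are joined with probability P[a][b]."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    probs = np.asarray(P)[labels][:, labels]
    upper = np.triu(rng.random(probs.shape) < probs, k=1)   # independent edges, no loops
    return (upper | upper.T).astype(float), labels

def modularity_spectrum(W):
    """Eigenvalues of the normalized modularity matrix, sorted by decreasing absolute value."""
    W = W / W.sum()
    s = np.sqrt(W.sum(axis=1))
    M = W / np.outer(s, s) - np.outer(s, s)
    mu = np.linalg.eigvalsh(M)
    return mu[np.argsort(-np.abs(mu))]

rng = np.random.default_rng(0)
k = 3
P = [[0.6, 0.1, 0.1], [0.1, 0.6, 0.1], [0.1, 0.1, 0.6]]     # illustrative probabilities
A, labels = planted_partition([100] * k, P, rng)
mu = modularity_spectrum(A)
print(np.round(np.abs(mu[:5]), 3))
```

With these particular probabilities the expected structural eigenvalues are roughly $0.6$, while the bulk stays well below that; other choices of $p_{ab}$ move both values.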

Theorem 3 Let $G = (V, \mathbf{W})$ be a connected edge-weighted graph on $n$ vertices,
with generalized degrees $d_1, \dots, d_n$ and degree matrix $\mathbf{D}$. Assume that
$\mathrm{Vol}(V) = 1$, and there are no dominant vertices, i.e., $d_i = \Theta(1/n)$,
$i = 1, \dots, n$, as $n \to \infty$. Let the eigenvalues of $\mathbf{M}_D$, enumerated in decreasing
absolute values, be

$$1 \ge |\mu_1| \ge \dots \ge |\mu_{k-1}| > \varepsilon \ge |\mu_k| \ge \dots \ge |\mu_n| = 0.$$

The partition $(V_1, \dots, V_k)$ of $V$ is defined so that it minimizes the weighted
$k$-variance $S_k^2(\mathbf{X}^*)$ of the optimum vertex representatives (defined in (2))
obtained as row vectors of the $n \times (k-1)$ matrix $\mathbf{X}^*$ of column vectors $\mathbf{D}^{-1/2}\mathbf{u}_i$,
where $\mathbf{u}_i$ is the unit-norm eigenvector corresponding to $\mu_i$ ($i = 1, \dots, k-1$).
Assume that there is a constant $0 < K \le \frac{1}{k}$ such that $|V_i| \ge Kn$, $i = 1, \dots, k$.

With the notation $s = \sqrt{S_k^2(\mathbf{X}^*)}$, the $(V_i, V_j)$ pairs are $O(\sqrt{2k}\, s + \varepsilon)$-volume
regular ($i \ne j$) and for the clusters $V_i$ ($i = 1, \dots, k$) the following holds: for
all $X, Y \subseteq V_i$,

$$|w(X, Y) - \rho(V_i)\mathrm{Vol}(X)\mathrm{Vol}(Y)| = O(\sqrt{2k}\, s + \varepsilon)\,\mathrm{Vol}(V_i),$$

where $\rho(V_i) = \frac{w(V_i, V_i)}{\mathrm{Vol}^2(V_i)}$ is the relative intra-cluster density of $V_i$.

Note that, in Section 2, we indexed the eigenvalues of $\mathbf{M}_D$ in non-increasing
order and denoted them by $\lambda$'s. The set of all $\lambda_i$'s is the same as that of all
$\mu_i$'s. Nonetheless, we need a different notation for the eigenvalues indexed in
decreasing order of their absolute values. Recall that 1 cannot be an eigenvalue
of $\mathbf{M}_D$ if $G$ is connected. Consequently, $|\mu_1| = 1$ can hold if and only if $\mu_1 = -1$,
i.e., if $G$ is bipartite. For example, if the conditions of the above theorem hold
with $k = 2$ and $\mu_1 = -1$ ($|\mu_i| \le \varepsilon$, $i \ge 2$), then our graph is a bipartite
expander discussed in [1] in detail.
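The clustering procedure behind Theorem 3 (structural eigenvectors, transformed representatives, weighted $k$-means) can be sketched as follows. This is a minimal illustration assuming NumPy, with a plain Lloyd iteration and a deterministic farthest-point initialization standing in for whatever $k$-means variant one prefers; the function names and the planted test graph are hypothetical.

```python
import numpy as np

def representatives(W, k):
    """Rows of (D^{-1/2} u_1, ..., D^{-1/2} u_{k-1}) for the k-1 structural eigenvalues."""
    W = W / W.sum()
    d = W.sum(axis=1)
    s = np.sqrt(d)
    M = W / np.outer(s, s) - np.outer(s, s)
    mu, U = np.linalg.eigh(M)
    idx = np.argsort(-np.abs(mu))[: k - 1]   # largest absolute values, either sign
    return U[:, idx] / s[:, None], d

def weighted_kmeans(X, d, k, n_iter=50):
    """Lloyd iterations for the weighted k-variance objective; returns labels and its value."""
    centers = [X[0]]                         # deterministic farthest-point initialization
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for a in range(k):
            if np.any(labels == a):          # degree-weighted center of cluster a
                centers[a] = np.average(X[labels == a], axis=0, weights=d[labels == a])
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    return labels, float((d * dist[np.arange(len(d)), labels]).sum())

# Illustrative planted 3-cluster graph (probabilities chosen only for the demonstration).
rng = np.random.default_rng(0)
k, size = 3, 100
truth = np.repeat(np.arange(k), size)
probs = np.where(truth[:, None] == truth[None, :], 0.6, 0.1)
upper = np.triu(rng.random(probs.shape) < probs, k=1)
A = (upper | upper.T).astype(float)

X, d = representatives(A, k)
labels, s2 = weighted_kmeans(X, d, k)
print(s2)                                    # heuristic estimate of the k-variance
```

Lloyd iterations only find a local optimum, so the returned value is an upper estimate of $S_k^2(\mathbf{X}^*)$ rather than the exact minimum over all $k$-partitions.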

For the proof we need the definition of the cut norm of a matrix (see e.g., [16]) 
and the relation between it and the spectral norm. 

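Definition 4 The cut norm of the real matrix $\mathbf{A}$ with row-set $Row$ and
column-set $Col$ is

$$\|\mathbf{A}\|_\square = \max_{R \subseteq Row,\, C \subseteq Col} \left| \sum_{i \in R} \sum_{j \in C} a_{ij} \right|.$$

Lemma 5 For the $m \times n$ real matrix $\mathbf{A}$,

$$\|\mathbf{A}\|_\square \le \sqrt{mn}\, \|\mathbf{A}\|,$$

where the right hand side contains the spectral norm, i.e., the largest singular
value of $\mathbf{A}$.

PROOF.

$$\|\mathbf{A}\|_\square = \max_{\mathbf{x} \in \{0,1\}^m,\, \mathbf{y} \in \{0,1\}^n} |\mathbf{x}^T \mathbf{A} \mathbf{y}|
= \max_{\mathbf{x} \in \{0,1\}^m,\, \mathbf{y} \in \{0,1\}^n} \left| \left(\frac{\mathbf{x}}{\|\mathbf{x}\|}\right)^T \mathbf{A} \left(\frac{\mathbf{y}}{\|\mathbf{y}\|}\right) \right| \cdot \|\mathbf{x}\| \cdot \|\mathbf{y}\|
\le \sqrt{mn} \max_{\|\mathbf{x}\| = 1,\, \|\mathbf{y}\| = 1} |\mathbf{x}^T \mathbf{A} \mathbf{y}| = \sqrt{mn}\, \|\mathbf{A}\|,$$

since for $\mathbf{x} \in \{0,1\}^m$, $\|\mathbf{x}\| \le \sqrt{m}$, and for $\mathbf{y} \in \{0,1\}^n$, $\|\mathbf{y}\| \le \sqrt{n}$. □

The definition of the cut norm and the result of the above lemma naturally
extend to symmetric matrices with $m = n$, the spectral norm of which is the
absolute value of the maximum absolute value eigenvalue.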
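For small matrices the cut norm can be computed exactly and compared with the bound of Lemma 5. The sketch below assumes NumPy; `cut_norm` is a hypothetical helper that enumerates the row subsets and, for each of them, picks the best column subset directly (all columns with positive restricted sum, or all with negative restricted sum).

```python
import itertools
import numpy as np

def cut_norm(A):
    """Exact cut norm: max over row subsets R and column subsets C of |sum_{R x C} a_ij|."""
    A = np.asarray(A, dtype=float)
    m, _ = A.shape
    best = 0.0
    for rows in itertools.product([0.0, 1.0], repeat=m):
        col_sums = np.array(rows) @ A        # column sums restricted to the chosen rows
        # For fixed R, the optimal C collects all positive or all negative column sums.
        best = max(best, col_sums[col_sums > 0].sum(), -col_sums[col_sums < 0].sum())
    return best

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 5))              # illustrative test matrix
bound = np.sqrt(A.shape[0] * A.shape[1]) * np.linalg.norm(A, 2)
print(cut_norm(A), bound)
```

The all-ones matrix shows that the bound can be attained: for the $2 \times 3$ matrix of ones both sides equal 6.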

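PROOF. (Theorem 3). Recall that the spectrum of $\mathbf{D}^{-1/2}\mathbf{W}\mathbf{D}^{-1/2}$ differs
from that of $\mathbf{M}_D$ only in the following: it contains the eigenvalue $\mu_0 = 1$ with
corresponding unit-norm eigenvector $\mathbf{u}_0 = \sqrt{\mathbf{d}}$ instead of the eigenvalue 0 of
$\mathbf{M}_D$ with the same eigenvector. If $G$ is connected, 1 is a single eigenvalue. The
optimum $(k-1)$-dimensional representatives of the vertices are row vectors of
the matrix $\mathbf{X}^* = (\mathbf{x}_1^*, \dots, \mathbf{x}_{k-1}^*)$, where $\mathbf{x}_i^* = \mathbf{D}^{-1/2}\mathbf{u}_i$ ($i = 1, \dots, k-1$). The
representatives can as well be regarded as $k$-dimensional ones, as inserting the
vector $\mathbf{x}_0^* = \mathbf{D}^{-1/2}\mathbf{u}_0 = \mathbf{1}$ will not change the $k$-variance $s^2 = S_k^2(\mathbf{X}^*)$. Assume
that the minimum $k$-variance is attained on the $k$-partition $(V_1, \dots, V_k)$ of the
vertices. By an easy analysis of variance argument (see [5]) it follows that

$$s^2 = \sum_{i=0}^{k-1} \mathrm{dist}^2(\mathbf{u}_i, F), \qquad (3)$$

where $F = \mathrm{Span}\{\mathbf{D}^{1/2}\mathbf{z}_1, \dots, \mathbf{D}^{1/2}\mathbf{z}_k\}$ with the so-called normalized partition
vectors $\mathbf{z}_1, \dots, \mathbf{z}_k$ of coordinates $z_{ji} = \frac{1}{\sqrt{\mathrm{Vol}(V_i)}}$ if $j \in V_i$ and 0, otherwise
($i = 1, \dots, k$). Note that the vectors $\mathbf{D}^{1/2}\mathbf{z}_1, \dots, \mathbf{D}^{1/2}\mathbf{z}_k$ form an orthonormal
system. By considerations proved in [5], we can find another orthonormal
system $\mathbf{v}_0, \dots, \mathbf{v}_{k-1} \in F$ such that

$$s^2 \le \sum_{i=0}^{k-1} \|\mathbf{u}_i - \mathbf{v}_i\|^2 \le 2s^2 \qquad (4)$$

($\mathbf{v}_0 = \mathbf{u}_0$, since $\mathbf{u}_0 \in F$). We approximate the matrix
$\mathbf{D}^{-1/2}\mathbf{W}\mathbf{D}^{-1/2} = \sum_{i=0}^{n-1} \mu_i \mathbf{u}_i \mathbf{u}_i^T$ by the rank $k$ matrix
$\sum_{i=0}^{k-1} \mu_i \mathbf{v}_i \mathbf{v}_i^T$ with the following accuracy (in spectral norm):

$$\left\| \sum_{i=0}^{n-1} \mu_i \mathbf{u}_i \mathbf{u}_i^T - \sum_{i=0}^{k-1} \mu_i \mathbf{v}_i \mathbf{v}_i^T \right\|
\le \sum_{i=0}^{k-1} |\mu_i| \cdot \left\| \mathbf{u}_i \mathbf{u}_i^T - \mathbf{v}_i \mathbf{v}_i^T \right\| + \left\| \sum_{i=k}^{n-1} \mu_i \mathbf{u}_i \mathbf{u}_i^T \right\|, \qquad (5)$$

which can be estimated from above with
$\sum_{i=0}^{k-1} \sin \alpha_i + \varepsilon \le \sum_{i=0}^{k-1} \|\mathbf{u}_i - \mathbf{v}_i\| + \varepsilon \le \sqrt{2k}\, s + \varepsilon$,
where $\alpha_i$ is the angle between $\mathbf{u}_i$ and $\mathbf{v}_i$, and for it, $\sin \frac{\alpha_i}{2} = \frac{1}{2}\|\mathbf{u}_i - \mathbf{v}_i\|$
holds, $i = 0, \dots, k-1$.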

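Based on these considerations and the relation between the cut norm and the
spectral norm (see Lemma 5), the densities to be estimated in the defining
formula of volume regularity can be written in terms of stepwise constant
vectors in the following way. The vectors $\mathbf{y}_i := \mathbf{D}^{-1/2}\mathbf{v}_i$ are stepwise constants
on the partition $(V_1, \dots, V_k)$, $i = 0, \dots, k-1$. The matrix $\sum_{i=0}^{k-1} \mu_i \mathbf{y}_i \mathbf{y}_i^T$ is
therefore a symmetric block-matrix on $k \times k$ blocks belonging to the above
partition of the vertices. Let $\hat{w}_{ab}$ denote its entries in the $(a, b)$ block ($a, b =
1, \dots, k$). Using (5), the rank $k$ approximation of the matrix $\mathbf{W}$ is performed
with the following accuracy of the perturbation $\mathbf{E}$:

$$\|\mathbf{E}\| = \left\| \mathbf{W} - \mathbf{D} \Big( \sum_{i=0}^{k-1} \mu_i \mathbf{y}_i \mathbf{y}_i^T \Big) \mathbf{D} \right\|
= \left\| \mathbf{D}^{1/2} \Big( \mathbf{D}^{-1/2}\mathbf{W}\mathbf{D}^{-1/2} - \sum_{i=0}^{k-1} \mu_i \mathbf{v}_i \mathbf{v}_i^T \Big) \mathbf{D}^{1/2} \right\|.$$

Therefore, the entries of $\mathbf{W}$, for $i \in V_a$, $j \in V_b$, can be decomposed as
$w_{ij} = d_i d_j \hat{w}_{ab} + \eta_{ij}$, where the cut norm of the $n \times n$ symmetric error matrix
$\mathbf{E} = (\eta_{ij})$ restricted to $V_a \times V_b$ (otherwise it contains entries all zeros) and
denoted by $\mathbf{E}_{ab}$, is estimated as follows:

$$\begin{aligned}
\|\mathbf{E}_{ab}\|_\square &\le n \|\mathbf{E}_{ab}\| \le n \cdot \|\mathbf{D}_a^{1/2}\| \cdot (\sqrt{2k}\, s + \varepsilon) \cdot \|\mathbf{D}_b^{1/2}\| \\
&\le n \cdot \sqrt{c_1 \frac{\mathrm{Vol}(V_a)}{|V_a|}} \cdot \sqrt{c_1 \frac{\mathrm{Vol}(V_b)}{|V_b|}} \cdot (\sqrt{2k}\, s + \varepsilon) \\
&= c_1 \cdot \sqrt{\frac{n}{|V_a|}} \cdot \sqrt{\frac{n}{|V_b|}} \cdot \sqrt{\mathrm{Vol}(V_a)} \sqrt{\mathrm{Vol}(V_b)}\, (\sqrt{2k}\, s + \varepsilon) \\
&\le c_1 \cdot \frac{1}{K} \sqrt{\mathrm{Vol}(V_a)} \sqrt{\mathrm{Vol}(V_b)}\, (\sqrt{2k}\, s + \varepsilon) \\
&= c \sqrt{\mathrm{Vol}(V_a)} \sqrt{\mathrm{Vol}(V_b)}\, (\sqrt{2k}\, s + \varepsilon).
\end{aligned}$$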

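Here the diagonal matrix $\mathbf{D}_a$ contains the diagonal part of $\mathbf{D}$ restricted to $V_a$,
otherwise zeros, and the constant $c$ does not depend on $n$. Consequently, for
$a, b = 1, \dots, k$ and $X \subseteq V_a$, $Y \subseteq V_b$:

$$\begin{aligned}
&|w(X, Y) - \rho(V_a, V_b)\mathrm{Vol}(X)\mathrm{Vol}(Y)| \\
&\quad = \left| \sum_{i \in X} \sum_{j \in Y} (d_i d_j \hat{w}_{ab} + \eta_{ij}) - \frac{\mathrm{Vol}(X)\mathrm{Vol}(Y)}{\mathrm{Vol}(V_a)\mathrm{Vol}(V_b)} \sum_{i \in V_a} \sum_{j \in V_b} (d_i d_j \hat{w}_{ab} + \eta_{ij}) \right| \\
&\quad = \left| \sum_{i \in X} \sum_{j \in Y} \eta_{ij} - \frac{\mathrm{Vol}(X)\mathrm{Vol}(Y)}{\mathrm{Vol}(V_a)\mathrm{Vol}(V_b)} \sum_{i \in V_a} \sum_{j \in V_b} \eta_{ij} \right| \\
&\quad \le 2c(\sqrt{2k}\, s + \varepsilon) \sqrt{\mathrm{Vol}(V_a)\mathrm{Vol}(V_b)},
\end{aligned}$$

which gives the required statement both in the $a \ne b$ and $a = b$ case. □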



Note that in the $k = 2$ special case, due to a theorem proved in [5], the
2-variance of the optimum 1-dimensional representatives can be directly
estimated from above by the gap between the two largest absolute value
eigenvalues of $\mathbf{M}_D$, and hence, the statement of Theorem 3 simplifies, see [8]. For
a general $k$, we can make the following considerations.
Assume that the normalized modularity spectrum (with decreasing absolute
values) of $G = (V, \mathbf{W})$ satisfies

$$1 \ge |\mu_1| \ge \dots \ge |\mu_{k-1}| \ge \theta > \varepsilon \ge |\mu_k| \ge \dots \ge |\mu_n| = 0.$$

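Our purpose is to estimate $s$ with the gap $\delta := \theta - \varepsilon$. We will use the notation
of the proof of Theorem 3 and apply the results of [4] for the perturbation of
spectral subspaces of the symmetric matrices

$$\mathbf{A} = \sum_{i=0}^{n-1} \mu_i \mathbf{u}_i \mathbf{u}_i^T \quad \text{and} \quad \mathbf{B} = \sum_{i=0}^{k-1} \mu_i \mathbf{v}_i \mathbf{v}_i^T$$

in the following situation. The subsets $S_1 = \{\mu_k, \dots, \mu_{n-1}\}$ and $S_2 = \{\mu_0, \dots, \mu_{k-1}\}$
of the eigenvalues of $\mathbf{D}^{-1/2}\mathbf{W}\mathbf{D}^{-1/2}$ are separated by an annulus, where
$\mathrm{dist}(S_1, S_2) = \delta > 0$. Denote by $\mathbf{P}_A$ and $\mathbf{P}_B$ the projections onto the spectral
subspaces of $\mathbf{A}$ and $\mathbf{B}$ spanned by the eigenvectors corresponding to the
eigenvalues in $S_1$ and $S_2$, respectively:

$$\mathbf{P}_A(S_1) = \sum_{j=k}^{n-1} \mathbf{u}_j \mathbf{u}_j^T, \qquad \mathbf{P}_B(S_2) = \sum_{i=0}^{k-1} \mathbf{v}_i \mathbf{v}_i^T.$$

Then Theorem VII.3.4 of [4] implies that

$$\|\mathbf{P}_A \mathbf{P}_B\|_F \le \frac{1}{\delta} \|\mathbf{P}_A (\mathbf{A} - \mathbf{B}) \mathbf{P}_B\|_F, \qquad (6)$$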

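where $\|.\|_F$ denotes the Frobenius norm. On the left hand side,
$\|\mathbf{P}_A \mathbf{P}_B\|_F = \sqrt{\sum_{i=0}^{k-1} \sin^2 \alpha_i}$, and in view of $\|\mathbf{u}_i - \mathbf{v}_i\| = 2 \sin \frac{\alpha_i}{2}$ and (4),
this is between […] and $s$. On the right hand side,

$$\mathbf{P}_A \mathbf{A} \mathbf{P}_B - \mathbf{P}_A \mathbf{B} \mathbf{P}_B = (\mathbf{P}_A \mathbf{A}) \mathbf{P}_B - \mathbf{P}_A (\mathbf{P}_B \mathbf{B})
= \sum_{i=0}^{k-1} \sum_{j=k}^{n-1} (\mu_i - \mu_j)\, \mathbf{u}_j^T (\mathbf{u}_i - \mathbf{v}_i)\, \mathbf{u}_j \mathbf{v}_i^T,$$

where the Frobenius norm of the rank 1 matrices $\mathbf{u}_j \mathbf{v}_i^T$ is 1, and the inner
product $\mathbf{u}_j^T (\mathbf{u}_i - \mathbf{v}_i)$ is the smaller, the closer the $\mathbf{u}_i$'s and the $\mathbf{v}_i$'s are
($i = 1, \dots, k-1$). Therefore, by the inequality (6), $s$ is the smaller, the larger $\delta$ is
and the closer the $|\mu_i - \mu_j|$ differences for $i = 0, \dots, k-1$; $j = k, \dots, n-1$ are
to $\delta$. If $|\mu_k| = \varepsilon$ is small, then $|\mu_1|, \dots, |\mu_{k-1}|$ should be close to each other
($\mu_0 = 1$ does not play an important role because of $\mathbf{u}_0 = \mathbf{v}_0$).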
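The subspace perturbation bound used above can be tried out numerically. The sketch below assumes NumPy and takes the constant in (6) to be $1/\delta$, which is the Frobenius-norm form of the bound for symmetric matrices whose relevant spectra are at distance at least $\delta$; the eigenvalues, the dimension and the size of the rotation are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3

# Symmetric A with k eigenvalues of large modulus and n-k of small modulus (illustrative).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
mu = np.array([1.0, 0.8, -0.7, 0.1, -0.1, 0.05, 0.0, -0.05])
A = Q @ np.diag(mu) @ Q.T

# B: rank-k matrix with the same large eigenvalues on a slightly rotated orthonormal system.
V, _ = np.linalg.qr(Q[:, :k] + 0.05 * rng.standard_normal((n, k)))
B = V @ np.diag(mu[:k]) @ V.T

P_A = Q[:, k:] @ Q[:, k:].T      # projection onto the eigenvectors of A belonging to S_1
P_B = V @ V.T                    # projection onto the eigenvectors of B belonging to S_2
delta = np.min(np.abs(mu[:k])) - np.max(np.abs(mu[k:]))   # width of the separating annulus

lhs = np.linalg.norm(P_A @ P_B, "fro")
rhs = np.linalg.norm(P_A @ (A - B) @ P_B, "fro") / delta
print(lhs, rhs)                  # lhs should not exceed rhs
```

Increasing the perturbation factor `0.05` enlarges both sides, while moving the small and large eigenvalues closer together shrinks `delta` and loosens the bound.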



4 Testability of the normalized modularity spectrum and eigen-subspaces

The authors of [12] defined the testability of simple graph parameters and proved
equivalent notions of this testability. They also anticipated that their results
remain valid if they consider weighted graph sequences $(G_n)$ with edge-weights
in the $[0,1]$ interval and no dominant vertex-weights $\alpha_i(G_n) > 0$ ($i = 1, \dots, n$),
i.e., $\max_i \frac{\alpha_i(G_n)}{\alpha_{G_n}} \to 0$ as $n \to \infty$, where $\alpha_{G_n} = \sum_{i=1}^n \alpha_i(G_n)$. To this end, in [11],
we slightly modified the definition of a testable graph parameter for weighted
graphs in the following way.

Definition 6 A weighted graph parameter $f$ is testable if for every $\varepsilon > 0$ there
is a positive integer $m < n$ such that if $G_n$ satisfies $\max_i \frac{\alpha_i(G_n)}{\alpha_{G_n}} \le \frac{1}{m}$, then

$$\mathbb{P}(|f(G_n) - f(\eta(m, G_n))| > \varepsilon) \le \varepsilon,$$

where $\eta(m, G_n)$ is a random simple graph on $m$ vertices selected randomly from
$G_n$ in the following manner: $m$ vertices of $G_n$ are selected with replacement,
with respective probabilities proportional to the vertex-weights; given the
selected vertex-subset, the edges come into existence conditionally independently,
with probabilities of the edge-weights.

By the above definition, a testable weighted graph parameter can be
consistently estimated based on a fairly large sample. Based on the results of [12] for
simple graphs, in [11], we established equivalent statements of this testability,
from among which we will use the following.

Fact 7 Let $f$ be a testable weighted graph parameter. Then for every convergent
weighted graph sequence $(G_n)$, with no dominant vertex-weights, $f(G_n)$
is also convergent as $n \to \infty$.

The notion of the convergence of a weighted graph sequence is defined in [12],
where the authors also describe the limit object as a symmetric, measurable
function $W : [0,1] \times [0,1] \to [0,1]$, called graphon. The so-called cut distance
between the graphons $W$ and $U$ is $\delta_\square(W, U) = \inf_\nu \|W - U^\nu\|_\square$, where the cut
norm of the graphon $W$ is defined by

$$\|W\|_\square = \sup_{S, T \subseteq [0,1]} \left| \int_S \int_T W(x, y)\, dx\, dy \right|,$$

and the above infimum is taken over all measure preserving bijections $\nu :
[0,1] \to [0,1]$, while $U^\nu$ denotes the transformed $U$ after performing the same
measure preserving bijection $\nu$ on both sides of the unit square. Graphons are
considered modulo measure preserving maps, and under graphon the whole
equivalence class is understood. In this way, to a convergent weighted graph
sequence $(G_n)$, there is a unique limit graphon $W$ such that $\delta_\square(G_n, W) \to 0$
as $n \to \infty$, where $\delta_\square(G_n, W)$ is defined as $\delta_\square(W_{G_n}, W)$ with the step-function
graphon $W_{G_n}$ assigned to $G_n$ in the following way: the sides of the unit square
are divided into intervals $I_1, \dots, I_n$ of lengths $\alpha_1(G_n)/\alpha_{G_n}, \dots, \alpha_n(G_n)/\alpha_{G_n}$,
and over the rectangle $I_i \times I_j$ the step-function takes on the value $w_{ij}(G_n)$.
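The sampling scheme of Definition 6 is straightforward to write down. The following sketch assumes NumPy; the function name `sample_eta` and the random test graph are hypothetical. Vertices are drawn with replacement with probabilities proportional to the vertex-weights, and each edge between two drawn vertices appears independently with probability equal to the corresponding edge-weight.

```python
import numpy as np

def sample_eta(W, alpha, m, rng):
    """Draw eta(m, G): returns the 0/1 adjacency matrix and the indices of the drawn vertices.

    W     : symmetric edge-weight matrix with entries in [0, 1] and zero diagonal
    alpha : positive vertex-weights
    """
    W = np.asarray(W, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    chosen = rng.choice(len(alpha), size=m, replace=True, p=alpha / alpha.sum())
    probs = W[np.ix_(chosen, chosen)]        # a vertex drawn twice is never joined to itself
    upper = np.triu(rng.random((m, m)) < probs, k=1)
    return (upper | upper.T).astype(int), chosen

rng = np.random.default_rng(0)
R = np.triu(rng.random((20, 20)), k=1)       # illustrative weighted graph on 20 vertices
W = R + R.T
alpha = W.sum(axis=1)                        # generalized degrees used as vertex-weights
A, chosen = sample_eta(W, alpha, 8, rng)
print(chosen)
print(A)
```

Using the generalized degrees as vertex-weights matches the setting of the later theorems; any other positive weights can be passed in the same way.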
In [11], we proved the testability of some normalized and unnormalized
balanced multiway cut densities such that we imposed balancing conditions on
the cluster volumes. Under similar conditions, for a fixed number of clusters
$k$, the unnormalized and normalized multiway cuts and modularities are also
testable, provided our edge-weighted graph has no dominant vertices. The
proofs rely on statistical physics notions of [13], utilizing the fact that the
graph convergence implies the convergence of the ground state energy
(minimum of the energy function over the set of $k$-partitions of vertices). In [22],
the authors showed that the Newman-Girvan modularity is an energy function
(Hamiltonian), and hence, testability of the maximum/minimum normalized
modularities, under appropriate balancing conditions, can be shown
analogously. Here we rather discuss the testability of spectra and $k$-variances,
because in spectral clustering methods these provide us with polynomial time
algorithms, though only approximate solutions are obtained as analyzed in
Section 2.

In Theorem 6.6 of [13], the authors prove that the normalized spectrum of a
convergent graph sequence also converges in the following sense. Let $W$ be a
graphon and $(G_n)$ be a sequence of weighted graphs with uniformly bounded
edge-weights tending to $W$. (For simplicity, we assume that $|V(G_n)| = n$.)
Let $|\lambda_{n,1}| \ge |\lambda_{n,2}| \ge \dots \ge |\lambda_{n,n}|$ be the adjacency eigenvalues of $G_n$ indexed
by their decreasing absolute values, and let $\mu_{n,i} = \lambda_{n,i}/n$ ($i = 1, \dots, n$) be
the normalized eigenvalues. Further, let $T_W$ be the $L^2[0,1] \to L^2[0,1]$ integral
operator corresponding to $W$:

$$(T_W f)(x) = \int_0^1 W(x, y) f(y)\, dy.$$

It is well-known that this operator is self-adjoint and compact, and hence, it
has a discrete real spectrum, whose only possible point of accumulation is
0. Let $\mu_i(W)$ denote the $i$th largest absolute value eigenvalue of $T_W$. Then for
every $i \ge 1$, $\mu_{n,i} \to \mu_i(W)$ as $n \to \infty$. In fact, the authors prove a bit more
(see Theorem 6.7 of [13]): if a sequence $W_n$ of uniformly bounded graphons
converges to a graphon $W$, then for every $i \ge 1$, $\mu_i(W_n) \to \mu_i(W)$ as $n \to \infty$.
Note that the spectrum of $W_G$ is the normalized spectrum of $G$, together with
countably infinitely many 0's. Therefore, the convergence of the spectrum of
$(G_n)$ is the consequence of that of $(W_{G_n})$.
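The convergence of the normalized spectrum can be observed on a graphon whose operator spectrum is known in closed form. As an illustrative assumption (not an example from the paper), take $W(x, y) = xy$: then $T_W$ has the single non-zero eigenvalue $\int_0^1 y^2\, dy = 1/3$ with eigenfunction $f(x) = x$. The sketch below, assuming NumPy, discretizes this graphon into weighted graphs and prints the two leading normalized eigenvalues.

```python
import numpy as np

def normalized_spectrum(n):
    """mu_{n,i} = lambda_{n,i} / n for the weighted graph w_ij = x_i x_j, x_i = (i - 1/2) / n."""
    x = (np.arange(n) + 0.5) / n
    W = np.outer(x, x)
    np.fill_diagonal(W, 0.0)                 # zero diagonal, as required of weight matrices
    lam = np.linalg.eigvalsh(W)
    return lam[np.argsort(-np.abs(lam))] / n # decreasing absolute value, divided by n

for n in (50, 200, 800):
    print(n, normalized_spectrum(n)[:2])     # should approach (1/3, 0)
```

The deviation from $1/3$ is of order $1/n$ here, caused by removing the diagonal.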

We will prove that in the absence of dominant vertices, the normalized
modularity spectrum is testable. To this end, both the modularity matrix and the
graphon are related to kernels of special integral operators, described herein.
Let $(\xi, \xi')$ be a pair of identically distributed real-valued random variables
defined over the product space $\mathcal{X} \times \mathcal{X}$ having a symmetric joint distribution
$\mathbb{W}$ with equal margins $\mathbb{P}$. Assume that the dependence between $\xi$ and $\xi'$ is
regular, i.e., their joint distribution $\mathbb{W}$ is absolutely continuous with respect
to the product measure $\mathbb{P} \times \mathbb{P}$, and let $w$ denote its Radon-Nikodym
derivative, see [23]. Let $H = L^2(\xi)$ and $H' = L^2(\xi')$ be the Hilbert spaces of random
variables which are functions of $\xi$ and $\xi'$ and have zero expectation and finite
variance with respect to $\mathbb{P}$. Observe that $H$ and $H'$ are isomorphic Hilbert
spaces with the covariance as inner product; further, they are embedded as
subspaces into the $L^2$-space defined similarly over the product space. (Here
$H$ and $H'$ are also isomorphic in the sense that for any $\psi \in H$ there exists a
$\psi' \in H'$ and vice versa, such that $\psi$ and $\psi'$ are identically distributed.)

Consider the linear operator taking conditional expectation between $H'$ and
$H$ with respect to the joint distribution. It is an integral operator and will be
denoted by $P_{\mathbb{W}} : H' \to H$ as it is a projection restricted to $H'$ and projects
onto $H$. To $\psi' \in H'$ the operator $P_{\mathbb{W}}$ assigns $\psi \in H$ such that $\psi = \mathbb{E}_{\mathbb{W}}(\psi' \mid \xi)$,
i.e.,

$$\psi(x) = \int_{\mathcal{X}} w(x, y) \psi'(y)\, \mathbb{P}(dy), \quad x \in \mathcal{X}.$$

If

$$\int_{\mathcal{X}} \int_{\mathcal{X}} w^2(x, y)\, \mathbb{P}(dx)\, \mathbb{P}(dy) < \infty,$$

then $P_{\mathbb{W}}$ is a Hilbert-Schmidt operator, therefore compact and has spectral
decomposition

$$P_{\mathbb{W}} = \sum_{i=1}^{\infty} \lambda_i \langle \,.\,, \psi_i' \rangle_{H'}\, \psi_i,$$

where for the eigenvalues $|\lambda_i| \le 1$ holds and the eigenvalue-eigenfunction
equation looks like

$$P_{\mathbb{W}} \psi_i' = \lambda_i \psi_i \quad (i = 1, 2, \dots),$$

where $\psi_i$ and $\psi_i'$ are identically distributed, whereas their joint distribution
is $\mathbb{W}$. It is easy to see that $P_{\mathbb{W}}$ is self-adjoint and it takes the constantly 1
random variable of $H'$ into the constantly 1 random variable of $H$; however,
the $\psi_0 = 1$, $\psi_0' = 1$ pair is not regarded as a function pair with eigenvalue
$\lambda_0 = 1$, since they do not have zero expectation. More precisely, the kernel is
reduced to $w(x, y) - 1$.

Theorem 8 Let $G_n = (V_n, \mathbf{W}_n)$ be the general entry of a convergent sequence
of connected edge-weighted graphs whose edge-weights are in $[0,1]$ and the
vertex-weights are the generalized degrees. Assume that there are no dominant
vertices. Let $W$ denote the limit graphon of the sequence $(G_n)$, and let

$$1 \ge |\mu_{n,1}| \ge |\mu_{n,2}| \ge \dots \ge |\mu_{n,n}| = 0$$

be the normalized modularity spectrum of $G_n$ (the eigenvalues are indexed by
their decreasing absolute values). Further, let $\mu_i(P_{\mathbb{W}})$ be the $i$th largest
absolute value eigenvalue of the integral operator $P_{\mathbb{W}} : L^2(\xi') \to L^2(\xi)$ taking
conditional expectation with respect to the joint measure $\mathbb{W}$ embodied by the
normalized limit graphon $W$, where $\xi, \xi'$ are identically distributed random
variables with the marginal distribution of their symmetric joint distribution $\mathbb{W}$.
Then for every $i \ge 1$,

$$\mu_{n,i} \to \mu_i(P_{\mathbb{W}}) \quad \text{as } n \to \infty.$$



PROOF. In case of a finite $\mathcal{X}$ (vertex set) we have a weighted graph, and
we will show that the operator taking conditional expectation with respect
to the joint distribution determined by the edge-weights corresponds to its
normalized modularity matrix.

Indeed, let $\mathcal{X} = V$, $|V| = n$, and $G_n = (V, \mathbf{W})$ be an edge-weighted graph on
the $n \times n$ weight matrix of the edges $\mathbf{W}$ with entries $W_{ij}$'s; now, they do not
necessarily sum up to 1. (For the time being, $n$ is kept fixed, so, for the sake
of simplicity, we do not denote the dependence of $\mathbf{W}$ on $n$.) Let the vertices
be also weighted with special weights $\alpha_i(G_n) := \sum_{j=1}^n W_{ij}$, $i = 1, \dots, n$. Then
the step-function graphon $W_{G_n}$ is such that $W_{G_n}(x, y) = W_{ij}$ whenever $x \in I_i$
and $y \in I_j$, where the (not necessarily contiguous) intervals $I_1, \dots, I_n$ form a
partition of $[0,1]$ such that the length of $I_i$ is $\alpha_i(G_n)/\alpha_{G_n}$ ($i = 1, \dots, n$).

Let us transform $\mathbf{W}$ into a symmetric joint distribution $\mathbb{W}_n$ over $V \times V$.
The entries $w_{ij} = W_{ij}/\alpha_{G_n}$ ($i, j = 1, \dots, n$) embody this discrete joint
distribution of random variables $\xi$ and $\xi'$ which are identically distributed with
marginal distribution $d_1, \dots, d_n$, where $d_i = \alpha_i(G_n)/\alpha_{G_n}$ ($i = 1, \dots, n$). With
the previous notation $H = L^2(\xi)$, $H' = L^2(\xi')$, the operator $P_{\mathbb{W}_n} : H' \to H$
taking conditional expectation is an integral operator with now discrete kernel
$K_{ij} = \frac{w_{ij}}{d_i d_j}$. The fact that $\psi, \psi'$ is an eigenfunction pair of $P_{\mathbb{W}_n}$ with eigenvalue
$\lambda$ means that

$$\frac{1}{d_i} \sum_{j=1}^n w_{ij} \psi'(j) = \sum_{j=1}^n \frac{w_{ij}}{d_i d_j} \psi'(j)\, d_j = \lambda \psi(i), \qquad (7)$$

where $\psi(j) = \psi'(j)$ denotes the value of $\psi$ or $\psi'$ taken on with probability
$d_j$ (recall that $\psi$ and $\psi'$ are identically distributed). The above equation is
equivalent to

$$\sum_{j=1}^n \frac{w_{ij}}{\sqrt{d_i}\sqrt{d_j}} \sqrt{d_j}\, \psi(j) = \lambda \sqrt{d_i}\, \psi(i),$$

therefore the vector of coordinates $\sqrt{d_i}\, \psi(i)$ ($i = 1, \dots, n$) is a unit-norm
eigenvector of the normalized modularity matrix with eigenvalue $\lambda$ (note that the
normalized modularity spectrum does not depend on the scale of the
edge-weights, it is the same whether we use $W_{ij}$'s or $w_{ij}$'s as edge-weights).
Consequently, the eigenvalues of the conditional expectation operator are the same
as the eigenvalues of the normalized modularity matrix, and the possible
values taken on by the eigenfunctions of the conditional expectation operator are
the same as the coordinates of the transformed eigenvectors of the normalized
modularity matrix forming the column vectors of the matrix $\mathbf{X}^*$ of the optimal
$(k-1)$-dimensional representatives, see Section 2 (a).
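The finite correspondence between the conditional expectation operator and the normalized modularity matrix can be checked directly. The sketch below assumes NumPy and uses a small random weighted graph as a hypothetical example. It forms the discrete kernel $K_{ij} = w_{ij}/(d_i d_j)$, takes a structural eigenvector $\mathbf{u}$ of the normalized modularity matrix, and tests that $\psi(i) = u_i/\sqrt{d_i}$ satisfies the eigenvalue equation (7) with zero expectation and unit variance.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 7
R = np.triu(rng.random((n, n)), k=1)
W_raw = R + R.T                              # edge-weights W_ij, zero diagonal

w = W_raw / W_raw.sum()                      # joint distribution of (xi, xi')
d = w.sum(axis=1)                            # common marginal distribution d_1, ..., d_n

K = w / np.outer(d, d)                       # discrete kernel K_ij = w_ij / (d_i d_j)
P = K * d[None, :]                           # (P psi')(i) = sum_j K_ij psi'(j) d_j

s = np.sqrt(d)
M = w / np.outer(s, s) - np.outer(s, s)      # normalized modularity matrix
mu, U = np.linalg.eigh(M)
i = np.argmax(np.abs(mu))                    # index of the largest absolute value eigenvalue
psi = U[:, i] / s                            # values of the eigenfunction: psi(j) = u_j / sqrt(d_j)

print(np.allclose(P @ psi, mu[i] * psi))     # eigenvalue equation (7)
print(d @ psi, d @ psi ** 2)                 # expectation and variance of psi
```

The matrix `P` is the random-walk matrix $\mathbf{D}^{-1}\mathbf{W}$; apart from its trivial eigenvalue 1 (constant functions), its spectrum coincides with that of the normalized modularity matrix apart from the eigenvalue 0 belonging to $\sqrt{\mathbf{d}}$.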
Let $f$ be a stepwise constant function on $[0,1]$, taking on value $\psi(i)$ on $I_i$.
Then $\mathrm{Var}\, \psi = 1$ is equivalent to $\int_0^1 f^2(x)\, dx = 1$. Let $K_{G_n}$ be the stepwise
constant graphon defined as $K_{G_n}(x, y) = K_{ij}$ for $x \in I_i$ and $y \in I_j$. With this,
the eigenvalue-eigenvector equation (7) looks like

$$\lambda f(x) = \int_0^1 K_{G_n}(x, y) f(y)\, dy.$$

The spectrum of $K_{G_n}$ is the normalized modularity spectrum of $G_n$ together
with countably infinitely many 0's (it is of finite rank, and therefore, trivially
compact), and because of the convergence of the weighted graph sequence $G_n$,
in lack of dominant vertices, the sequence of graphons $K_{G_n}$ also converges.
Indeed, the $W_{G_n} \to W$ convergence in the cut metric means the convergence
of the induced discrete distributions $\mathbb{W}_n$'s to the continuous $\mathbb{W}$. Since $K_{G_n}$
and $K$ are so-called copula transformations of those distributions, in lack
of dominant vertices (this causes the convergence of the margins) they also
converge, which in turn implies the $K_{G_n} \to K$ convergence in the cut metric.

Let $K$ denote the limit graphon of $K_{G_n}$ ($n \to \infty$). This will be the kernel of
the integral operator taking conditional expectation with respect to the joint
distribution $\mathbb{W}$. It is easy to see that this operator is also a Hilbert-Schmidt
operator, and therefore, compact. With these considerations the remainder of
the proof is analogous to the proof of Theorem 6.7 of [13], where the authors
prove that if the sequence $(W_{G_n})$ of graphons converges to the limit graphon
$W$, then both ends of the spectra of the integral operators, induced by $W_{G_n}$'s
as kernels, converge to the ends of the spectrum of the integral operator
induced by $W$ as kernel. We apply this argument for the spectra of the integral
operators induced by the kernels $K_{G_n}$'s and $K$. □

Note that in [19], kernel operators are also discussed, but not with our nor- 
malization. 

Remark 9 By Fact 7, provided there are no dominant vertices, Theorem 8 
implies that for any fixed positive integer $k$, the $(k-1)$-tuple of the largest 
absolute value eigenvalues of the normalized modularity matrix is testable. 
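
To illustrate what testability means in practice, the sketch below compares the structural eigenvalues of a full graph with those of a randomly sampled induced subgraph. It is an informal experiment under stated assumptions, not the sampling procedure of the theory: the planted three-cluster weight pattern, the uniform vertex sampling, and the helper names `normalized_modularity` and `structural_eigenvalues` are all hypothetical choices made for this example.

```python
import numpy as np

def normalized_modularity(W):
    """D^{-1/2} W D^{-1/2} - sqrt(d) sqrt(d)^T, with weights rescaled to sum to 1."""
    W = W / W.sum()
    d = W.sum(axis=1)
    s = np.sqrt(d)
    return W / np.outer(s, s) - np.outer(s, s)

def structural_eigenvalues(W, m):
    """The m largest absolute value eigenvalues, in decreasing absolute value."""
    mu = np.linalg.eigvalsh(normalized_modularity(W))
    return mu[np.argsort(-np.abs(mu))[:m]]

rng = np.random.default_rng(1)
k, block_size = 3, 200
n = k * block_size
z = np.repeat(np.arange(k), block_size)        # hypothetical planted clusters
B = np.full((k, k), 0.1) + 0.8 * np.eye(k)     # 0.9 within clusters, 0.1 between
U = np.triu(rng.uniform(0.5, 1.0, size=(n, n)), 1)
W = B[z][:, z] * (U + U.T)                     # symmetric weights, zero diagonal

full = structural_eigenvalues(W, k - 1)

# Uniform sample of 150 vertices and the induced weighted subgraph.
idx = rng.choice(n, size=150, replace=False)
sample = structural_eigenvalues(W[np.ix_(idx, idx)], k - 1)

print("full graph:", full)
print("sample    :", sample)
```

In this example the $(k-1)$-tuple computed from a quarter of the vertices should stay close to the one computed from the whole graph, which is the behaviour Remark 9 describes for graph sequences without dominant vertices.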

Theorem 10 Assume that there are constants $0 < \varepsilon < \theta < 1$ such that the 
normalized modularity spectrum (with decreasing absolute values) of any $G_n$ 
satisfies 

$$1 \ge |\mu_{n,1}| \ge \dots \ge |\mu_{n,k-1}| \ge \theta > \varepsilon \ge |\mu_{n,k}| \ge \dots \ge |\mu_{n,n}| = 0.$$

With the notions of Theorem 8, and assuming that there are no dominant 
vertices of the $G_n$'s, the subspace spanned by the transformed eigenvectors 
$D^{-1/2}u_1, \dots, D^{-1/2}u_{k-1}$ belonging to the $k-1$ largest absolute value eigenvalues of the 
normalized modularity matrix of $G_n$ also converges to the corresponding 
$(k-1)$-dimensional subspace of $P_W$. More precisely, if $P_{n,k-1}$ denotes the projection 
onto the subspace spanned by the transformed eigenvectors belonging to the $k-1$ 
largest absolute value eigenvalues of the normalized modularity matrix of $G_n$, 
and $P_{k-1}$ denotes the projection onto the corresponding eigen-subspace of $P_W$, 
then $\|P_{n,k-1} - P_{k-1}\| \to 0$ as $n \to \infty$ (in spectral norm). 



PROOF. If we apply the convergence fact $\mu_{n,i} \to \mu_i(P_W)$ for indices $i = k-1$ 
and $k$, we get that there will be a gap of order $\theta - \varepsilon - o(1)$ between $|\mu_{k-1}(P_W)|$ 
and $|\mu_k(P_W)|$ too. 

Let $P_{W,n}$ denote the $n$-rank approximation of $P_W$ (keeping its $n$ largest absolute 
value eigenvalues, together with the corresponding eigenfunctions) in spectral 
norm. The projection $P_{k-1}$ ($k < n$) operates on the eigen-subspace spanned 
by the eigenfunctions belonging to the $k-1$ largest absolute value eigenvalues 
of $P_{W,n}$ in the same way as on the corresponding $(k-1)$-dimensional subspace 
determined by $P_W$. With these considerations, we apply the perturbation 
theory of eigen-subspaces with the following unitarily invariant norm: the trace- or 
Schatten-norm of the Hilbert-Schmidt operator $A$ is $\|A\|_{tr} = \left(\sum_{i=1}^{\infty} \lambda_i^4(A)\right)^{1/4}$. 
Our argument with the finite $(k-1)$ rank projections is the following. 
Denoting by $P_{W_n}$ the integral operator belonging to the normalized modularity 
matrix of $G_n$ (with kernel $K_{G_n}$ introduced in the proof of Theorem 8), 

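$$\|P_{n,k-1} - P_{k-1}\| = \|P_{n,k-1}^{\perp} P_{k-1}\| \le \|P_{n,k-1}^{\perp} P_{k-1}\|_{tr} \le \frac{c}{\theta - \varepsilon - o(1)}\, \|P_{W_n} - P_{W,n}\|_{tr}$$

with constant $c$ that is at most $\pi/2$ (Theorem VII.3.2 of [4]). But 

$$\|P_{W_n} - P_{W,n}\|_{tr} \le \|P_{W_n} - P_W\|_{tr} + \|P_W - P_{W,n}\|_{tr},$$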

where the last term tends to 0 as $n \to \infty$, since the tail of the spectrum 
(taking the fourth power of the eigenvalues) of a Hilbert-Schmidt operator 
converges. For the convergence of the first term we use Lemma 7.1 of [12], 
which states that the trace-norm of an integral operator can be estimated 
from above by four times the cut norm of the corresponding kernel. But the 
convergence in the cut distance of the corresponding kernels to zero follows 
from the considerations made in the proof of Theorem 8. This finishes the 
proof. □ 
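
The perturbation step of the proof has a simple finite-dimensional analogue that can be checked numerically. The sketch below is only an illustration with symmetric matrices in place of the integral operators and the spectral norm in place of the Schatten norm. The matrices `A` and `B`, the prescribed eigenvalues, and the helper `top_abs_projection` are hypothetical. It verifies that the distance between the two projections is at most $(\pi/2)\,\|A - B\| / \delta$, where $\delta$ is the distance between the structural eigenvalues of `A` and the remaining eigenvalues of `B`.

```python
import numpy as np

def top_abs_projection(A, m):
    """Eigenvalues of symmetric A in decreasing absolute value, and the orthogonal
    projection onto the span of the eigenvectors of the first m of them."""
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(-np.abs(vals))
    V = vecs[:, order[:m]]
    return vals[order], V @ V.T

rng = np.random.default_rng(2)
n, m = 40, 2                                   # m plays the role of k - 1

# Symmetric A with two structural eigenvalues and a bulk inside [-0.1, 0.1].
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.concatenate(([0.8, -0.6], rng.uniform(-0.1, 0.1, n - m)))
A = Q @ np.diag(eigs) @ Q.T
A = (A + A.T) / 2                              # remove rounding asymmetry

# Symmetric perturbation rescaled to spectral norm 0.05.
E = rng.standard_normal((n, n))
E = (E + E.T) / 2
E *= 0.05 / np.linalg.norm(E, 2)
B = A + E

vals_A, P_A = top_abs_projection(A, m)
vals_B, P_B = top_abs_projection(B, m)

# Gap between the structural eigenvalues of A and the non-structural ones of B.
delta = np.min(np.abs(vals_A[:m, None] - vals_B[None, m:]))
lhs = np.linalg.norm(P_A - P_B, 2)
bound = (np.pi / 2) * np.linalg.norm(A - B, 2) / delta

print(f"||P_A - P_B|| = {lhs:.4f}, bound = {bound:.4f}")
```

As the perturbation shrinks while the gap stays bounded away from zero, the bound forces the projections together, which is the mechanism used above with $\|P_{W_n} - P_{W,n}\|_{tr} \to 0$.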

Remark 11 As the k-variance depends continuously on the above subspaces 
(see the expansion (3) of $s^2$ in the proof of Theorem 3), Theorem 10 implies 
the testability of the k-variance as well. 
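
The quantity in Remark 11 can be approximated numerically as follows. This is a minimal sketch, assuming that the k-variance is the minimum over $k$-partitions of the $d$-weighted within-cluster sum of squares of the representatives, i.e. of the rows of $D^{-1/2}[u_1, \dots, u_{k-1}]$. The helper names `representatives` and `weighted_k_variance`, the farthest-point initialization, and the planted example graph are hypothetical. Lloyd iterations only reach a local optimum, so the returned value is an upper estimate of the k-variance rather than its exact value.

```python
import numpy as np

def representatives(W, k):
    """Rows of D^{-1/2}[u_1, ..., u_{k-1}] and the generalized degrees d (sum 1)."""
    W = W / W.sum()
    d = W.sum(axis=1)
    s = np.sqrt(d)
    M = W / np.outer(s, s) - np.outer(s, s)    # normalized modularity matrix
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(-np.abs(vals))[: k - 1]
    return vecs[:, order] / s[:, None], d

def weighted_k_variance(X, d, k, n_iter=50):
    """Lloyd iterations for the d-weighted k-means objective (local optimum only)."""
    # Farthest-point initialization, chosen here only to keep the sketch deterministic.
    centers = [X[0]]
    for _ in range(k - 1):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    C = np.array(centers)
    for _ in range(n_iter):
        dist2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = dist2.argmin(axis=1)
        for a in range(k):
            mask = labels == a
            if mask.any():
                C[a] = np.average(X[mask], axis=0, weights=d[mask])
    dist2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    labels = dist2.argmin(axis=1)
    return float(np.sum(d * dist2[np.arange(len(d)), labels])), labels

rng = np.random.default_rng(3)
k, block_size = 3, 100
n = k * block_size
z = np.repeat(np.arange(k), block_size)        # hypothetical planted clusters
B = np.full((k, k), 0.1) + 0.8 * np.eye(k)
U = np.triu(rng.uniform(0.5, 1.0, size=(n, n)), 1)
W = B[z][:, z] * (U + U.T)

X, d = representatives(W, k)
s2, labels = weighted_k_variance(X, d, k)
print("estimated k-variance:", s2)
```

The objective is invariant under rotations of the eigenvector basis, so the estimate depends only on the spanned subspace, in line with the continuity argument of Remark 11.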
5 Summary 



The above results suggest that in the absence of dominant vertices, even the 
normalized modularity matrix of a smaller part of the underlying weighted 
graph, selected at random with an appropriate procedure, is able to reveal 
its cluster structure. Hence, the gain in computational time of this 
spectral clustering algorithm is twofold: we use only a smaller part of the graph, 
and the spectral decomposition of its normalized modularity matrix runs in 
polynomial time in the reduced number of vertices. Under the vertex- and 
cluster-balance conditions, this method can give quite good approximations 
for the multiway cuts and helps us to find the number of clusters and identify 
the cluster structure. In addition, by taking into account both the positive and 
negative large absolute value eigenvalues together with their eigenvectors, regular 
cuts can also be detected, as the investigated spectral characteristics give 
good estimates for the constant of volume regularity of the cluster pairs by 
Theorem 3. Such regular cuts are sought in social or biological networks, 
e.g., if we want to find equally functioning synapses of the brain. 